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EXTENDED  PINCUS  THEOREMS  AND  CONVERGENCE 
OF  SIMULATED  ANNEALING 


Abstract 

Pincus'  1968  formula  for  the  (unique)  global  minimum  of  a  continuous  function  on  a 
compact  set  in  E''  is  extended  to  finite  muitiple  optima  and  to  discrete  and  speciai 
variants.  The  impact  of  these  on  associated  ergodic  irreducibie  aperiodic  Markov  chain 
computation  he  initiated,  (1970)  currently  called  "simulated  annealing”,  is  exempiified 
and  assessed  ieading  to  grave  concern  about  what  current  simulated  annealing  processes 
may  converge  to  instead  of  optima. 
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EXTENDED  PINCUS  THEOREMS  AND  CONVERGENCE 
OF  SIMULATED  ANNEALING 
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Introduction 

In  1968,  M.  Pincus  [1]  derived  a  closed  form  expression  for  the  (assumed)  unique  global 
minimum  point  of  a  continuous  function  on  a  compact  set  S  in  as  the  limit  of  the  expected  value 
(vector)  of  a  one  parameter  family  of  distributions  based  on  the  continuous  function  F(x).  Since 
these  expressions  involve  integrations  over  irregular  sets  and  in  the  many  variables  (of  E'^),  he 
established  in  1970  [2]  an  approximate  computational  method  based  on  the  Metropolis-Ulam,  von 
Neumann,  et.  al  method  [3]  of  associating  an  irreducible  aperiodic  Markov  chain  with  the 
approximations  to  Riemann  integrals  and  using  the  strong  law  of  large  numbers  for  such  chains  to 
generate  an  approximation  to  the  unique  global  solution  point.  The  relevance  of  this  work  to  so- 
called  "simulated  annealing"  as  introduced  in  1983  by  Kirkpatrick,  Gelatt  arxl  Vecchi  [4]  was  not 
noticed  by  these  authors  since  there  is  no  mention  of  the  Pincus  work  therein.  Nor  is  there  any 
reference  to  it  in  current  1988  work  of  Hajek  [5]  or  by  Chiang  and  Chow  [6]. 

The  analytic  sharpness  of  the  Pincus  type  formula  is  therefore  here  extended  to 
situations  of  finite  global  optima,  in  several  variants,  thereby  providing  insight  into  what  simulated 
annealing  or  sample  Markov  chain  computations  may  converge  to,  both  for  finite  discrete 
optimization  problems  and  for  continuum  problems  of  non-convex  type  which  Pincus  seemed  to 
have  in  mind. 

The  basic  results  are  that  the  limit  vector  in  Pincus-type  fonnulas  is  a  convex  combination 
of  the  global  minimization  vectors.  The  convex  proportions  depend  on  the  relative  function 
shapes  and  volumes  in  the  immediate  neighborhoods  of  the  individual  global  minima. 

In  the  following  we  present  an  extended  Pincus  theorem  and  an  extended  discrete 
Pincus  theorem  as  two  extremes  of  a  possible  general  Pincus  formula  which  would  involve  a  sum 


of  multiple  Integrals  of  different  dimensions  corresponding  to  different  parts  of  the  domain  S  of 
the  function  F  whose  global  optima  are  sought.  We  proceed  to  examine  for  these  implications  of 
facts  like  the  values  of  the  function  F  are  only  known  as  one  performs  sample  jumps  on  Markov 
chains  whose  transition  probabifities  are  determined  only  as  one  makes  sample  jumps  from  a  pre¬ 
selected  irreducible  symmetric  aperiodic  Markov  transition  matrix. 

Such  facts  raise  considerable  concern  about  what  it  is  that  a  simulated  annealing  process 
may  converge  to  instead  of  a  global  optimum  of  the  original  problem.  Indeed,  our  analytic 
expressions  give  additional  support  to  the  lack  of  assurance  simulated  annealers  have  in  their 
results  as  exemplified  by  "re-annealing”  and  "annealing  schedule”  change  devices  introduced  in 
the  hope  of  obtaining  "better  optima. 


Extended  Pincus  Theorems 

In  [1]  M.  Pincus  gave  a  closed  form  expression  for  the  unique  global  minimum  vector  of  a 
continuous  function  F(x)  over  a  compact  set  S  in  E'^  as  the  limit  of  a  sequence  of  expected 
value  vectors.  We  here  extend  this  result  to  the  case  of  finite  global  minima  for  two  extreme 
cases,  one  in  which  S  is  a  discrete  set  of  points.  The  proof  arguments  involve  breaking  the 
integrals  or  sums  involved  into  two  parts,  one  in  the  neighborhood  of  a  giobal  optimum  and  the 
other  negligible  for  large  parameter  values. 

Theorem  1: 

Let  (1)  F(x)  be  continuous  over  a  compact  set  S  c  E'' 

(2)  The  global  minima  0*1 . k  of  F(x)  are  contained  in  disjoint 

closures  Oa  of  open  sets  Oq,  with  6a  c  S  . 


Then  there  exists 
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|Xr|  ,  0  ^  Xf  — >  oo  end  0  ^  |Ja  ■  51  Ma  *  1 

a»1 

such  that 

L  f  Xi  exp  (-Ar  F(x))  dx  /  [  exp  (-A,F(x))  dx  =  f  MaZ".  i  =  1,  2 . n. 

Xr  oo  JS  '  JS  0-1 

Proof:  S  may  be  divided  into  compact  sets  Sa  <x>ntaining  Oa  and  disjoint  except  for  boundaries 
of  n-dimensional  volume  zero.  Thus 

S-ua.  and 

for  integrals  with  n-dimensional  volume  measure. 

LetG(x,X)»exp  (-XF(x)).  Then 

(1)  XjG(x,x)dx/[  G(x,x)dx  =  5],  XjG{x,x)dx/[  g{x,x)cIx 

.  S  Js  o  Js«  Js 

Consider  Sq  in  which  za  is  the  only  global  ntinimum. 

Now 

f  » 

(2)  ®  dx  -  Zj®  g(x,  x)  dx  +  (x,  -z j“)  G (x,  x)  dx 

.  Sa  .  Sa  .  Sa 

and 

(21)  [  [  ’O  G  (^'  dx  =  z  “p{a ,  x)  +  [  G  (x,  x)  dx  {xi -zf)  G  (x,  x)  dx 

Us  J  Jsa  Us  J  Js« 

Where  p (a, x)  =  [  g(x, x)  dx /x  f  g{x, x)  dx . 

JSa  '  p  JSp 

Evidently  0<p(a,x)  and  for  all  X. 

a 


$ 


4 


We  break  the  integration  over  Sa  in  the  iast  term  into  two  parts  which  go  to  zero  as  the 
first  part  voiume  does  and  as  X-400  in  the  second  part. 

Let 

lsC>(x;jx-zo|<£|  for€>0,€<py3  where  p  is  the  minimum  distance  between 
the  2“  of  S. 


(4.2)  h(x,A,)c1x  ^  exp(-X8)  V{Sa) 

JSa-N‘ 

where  V{S“)  is  the  volume  of  Sa. 

Since  F(x)  -  F(z“)  is  continuous,  for  x  e  N?{  =  (x  :  |x-z“|  <  11}  , 

F(x)  -  F(z“)  <  5  /2.  Hence 

(4.3)  H(x.x)dx  >  H(x,x)dx  ^  V{n^)  exp  {- X5/2) 

JS  Jn? 

and  (4)  is  bounded  atx)ve  by 

(4.4)  V(Sa)  exp  (-  A8/2)/  V(N^) 

Thus 

[  G{x,x)dx  f  XI  g{x,x)  dx  -  zS*  p(a,x)  s  6  +  exp(-X8/2)  Mo  V(Sa)  /  Y  (n^) 
s  J  JSa 

which  goes  to  zero  as  e^O  and  X -»  «» . 

Since  0  <  p  (a,X)  and  forall  X  ,  every  sequence  {Xn}  <>0  containsa 

subsequence  for  which  p  (a,\)  converges  to  some  p  (a)  ,  0  <  p  (a)  ,  =  I  • 

O 

Note  that  p  (a,  X)  and  p  ( a )  are  independent  of  the  coordinate  xj  . 

Therefore,  going  back  to  (1),  every  non-negative  sequence  {x}  «  contains  (b.a.o.n.)  a 


subsequence  { \  }  such  that 
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(6) 


L 

Xr-*oo 


Xi  G  (x,  x)  dx 
's 


etc 


aal 


Q.E.D. 

In  R'rxxjs' fornula,  k  -  1,  |j.  (a,  X)  s  1  ,  so  that  no  subsequence  is  necessary. 

The  restriction  to  a  subsequence  of  X  oo  can  be  removed  sometimes  by  additionai 
special  conditions  on  F(x)  and  the  properties  of  S  in  the  neighborhood  of  the  conditions 
which  may  be  fulfiiied  in  some  optimization  applications  but  not  ones  with  F(x)  non-convex  at  a  z". 

Some  Examples 

Rrst,  however,  to  give  some  feeiing  for  the  convergence  situation,  we  consider  severai 
one-dimensionai  expiicitly  integrabie  exampies  as  in  Rgure  1 : 

FIGURE  1 


Convergence:  A,  B  ~  5  Convergence:  C,  D  ~  8.77,  E  ~  1-  5 
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A:  F(x)-(x,  1 

|10-X,  5^x^9 

Here  Si  » {x :  1  sx s5} arxJ  S2  =  {x : 55 x s  10} 


(A.1) 


S, 


X  g(x,  x) dx  ■  X'^  [e"*-  +  0  (e-®^)]  ;  [  Ca(x.  x) dx  »  X~’  [e^  +  0  (e-®)] 

JSi 


verifying  =  1 . 


(A.2) 


X"’  [ge"*’  +  0  (e-®)]  :  |g^  g{x,  x)  dx  =  X~’  [e-*-  +  0  (e-®)] 


verifying  z^  «  9. 

Ciearly.  for  large  X,  }i(1 ,  X)  =  1/2  +  0(e"^^)  and  ^(2,  X)  » 1/2  +  0(e"^^) 
Thus,  as  X  ->  00  .  convergence  of  (1)  is  to 
(A.3)  2  ^  Ifi®  global  maximum  of  F(x)! 


Next  consider 


(B.1)  F(x) 


X,  1  ^  X  ^  2 
{16-x)/7,  2^x^9 


Then  }i(1 ,  X)  =  9/1 6  +  0(e~2^) :  }i(2,  X)  =  7/16  +  0{e“2X)  and  convergence  is  to 


(B,2)  (9/16)*1  +  (7/16)*9  *  5 


The  value  here  is  1 1/7,  over  50%  higher  than  the  minimum  of  1 .  Note  that  the  difference 
in  shape  and  behavior  of  F(x)  in  Si  and  S2  has  made  p  (1 )  ^  p  (2). 


(C)  F(x)-  4x,  1  sx^2 

13-x  .  3  ^x  ^9 


8 


Here  Sj  =  {x  :  1  ^  x  S  2} ,  Sj  =  {x  :  3  ^  x  S  9}.  S  is  not  convex  and  not  connected. 


In  obvious  abbreviation, 

^  ^  “  (4^)"’  [e-^  +  0(e-®-)]  ;  Gdx»  (4X)-’  [ff^  +  0(e-®^)] 
Jsi  ' 


verifying  =  1 . 


» 

2)  X  Gdx  -  9X'‘e^  +0(X-i-e'’“^  :  Gdx-  (x)-i  [e-^  +  0(e-i»)] 

JSs 


verifying  z^  -g. 

For  large  X,  |l(1,  X)  *  1/37  +  O(0“4^) ,  H(2,  X)  =  36/37  +  0(0“®^)  and  convergence  is  to 


1  •  (1/37)  +  9  •  (36/37)  s  8.77 

The  sharper  rate  of  rise  of  F(x)  and  mudt  smaller  volume  in  Si  re  S2  has  nearly  eliminated 
the  Si  contribution. 

D :  Extending  F(x)  through  2  ^  x  ^  3  by  F(x) «  4  +  2x  so  that  F  is  continuous  and  S  convex 

and  connected,  the  contribution  as  X, ->  00  is  0  (X*^  e-10^)  in  xGdx  andisO(X’^  e-8X)  in 

Js’ 

G  dx  so  that  convergence  is  to  the  same  values  as  in  example  C. 

.S’ 

E :  Extend  F(x)  to  F(x)  -  / 4x,  1  -6  s  x  s  2  to  have  a  unique  global  rrtninxjm  at  x  =  1  -5,  with 

4  +  2x,2  ^  X  ^3 
.13-x,  3^  X  ^  9 

Si  _  {x:  1-Ss  X  S  3}  and  S2  .  (x:  3  s  x  S  9}  arxi  value  4-4  5  for  0  <  8  very  small. 


Then 


(E.1) 


xGdx  / 
Si 


I  Gdx  =  [(1-8)  +  0(e-4(i-*)^]  /  [l  +  4e-S^  +  0(e-5^)] 
S 


and 


(E.2) 


I  X  Gdx  / 
S2 


Gdx  =  36e-4S>.  [1 .4e-45X] 
.S 


Thus  for  small  5,  it  is  only  for  large  X  that  the  unique  and  very  little  smaller  global  minimum  is 
picked  up  e.g.  that  (E.2)  -4O. 

Special  and  Discrete  Pincus  Theorem  Extensions 

Under  spedal  conditions  on  F(x)  and  S,  asymptotic  expansions  for  the  integrals  of 
Xi  G(x,  X )  and  G  (x,  X)  for  large  X  may  be  obtained  by  Laplace’s  method,  see  pages  36-37  of 
Erdelyi's  monograph  "Asymptotic  Expansions"  (7]  and  by  Hsu's  extensions  to  multiple  integrals 
[8],  [9].  We  use  only  the  G  (x,X)  =  exp  [  -X  F(x)  ]  cases,  since  it  is  only  these  in  which  we  might 
need  to  consider  a  subsequence  of  X  tending  to  infinity.  Thus,  for  us,  Hsu’s  results,  pages  629- 
30  of  [9]  for  unique  global  minimum  on  the  boundary  of  Sa  or  page  626  of  [8]  for  interior  global 
minimu.'n  of  F  (x)  in  Sa ,  would  give  us 

f  exp[  -X  F(x)  ]  dx  =  exp[  -F  (z“)  ]  { (27r)V  Hn [  -F  (z”)  ]| ?  _ 

S  V  /  » 

X  — ^  00 


where  Hp  [F(z<^)  ]  is  the  n-dmensional  Hessian  of  F  at  zo .  The  set  Sa  is  supposed  to  contain 
a  closed  finitely  connected  domain  with  continuously  turning  targent  hyperplane  which  includes 
za.  The  function  of  F  is  to  be  of  class  (continuous  with  continuous  first  and  second  partial 
derivatives ) .  The  Hessian  required  to  be  positive  implies  that  F(x)  is  strictly  convex  at  z« ,  a  most 


restrictive  condition.  Thus,  Pincus'  result  is  much  stronger  than  the  asymptotic  Laplace-Hsu 
results. 


Discrete  Case 

Consider  a  bounded  discrete  set  of  points  DCE*'  and  a  function  F(x)  defined  on  D 

and  with  global  minimum  there  at  z^,  a  » 1 . k. 

Theorem  2:  If 

(1)  F(x)  has  a  finite  number  k  of  global  minima  in  D  at  zo,  a^l . k. 

(2)  |0|  ^Vd  and  for  some  Ao  >  0, 


J  exp  [-XoF(x)]<«o 
xeD 

(3)  inf 

D^U(z“) 
a 

Then 

L  Z  xiexp[-XF(x)I 

A-xw  xeD 


rF{x)-F(z^J-5>0 


/  Z  exp{-AF(x)J-  k’  X  2f,  i=1 . n 

'  xeD  0-1 


Proof:  Let  G  (x.  a)  =  exp  [-X  F  (z«)],  H  (x,  x)  *  exp  [-  A  F  (x)  -  F  (z“)  ]  , 

Z  ■  U  {z“}  and  D\Z  =  d'. 
a 

Then 

5;xiG(x.A)/X  G{x.A)-  5^  x|H{x.X)/  £  H(x.A) 

'  '  xeO  xeO  xeO  xeD 


(7.1  )  - 

- 1 

z?+  X  Xi  H(x,X) 

/ 

'k+  X  H(x,X) 

L«-i 

xeD' 

xeD' 

Let  X  =  Xq  +  ^0-  Then 

(7.2  )  h(x.x)-  h(x.Xo)  exp[-XiF(x)-F(^]  5  exp  {-Xi  8)  H(x.Xo) 
So 

(8  )  S  S  H(x.X)) 

xeD'  xeD* 

(8.1)  X  XiH(x.X)s  VoX  ^  VDexp(-Xi5)X  H(x,X>) 

X£  D*  XE  D*  XE  D' 

(8.2)  L  S  H(x.X)  =  L  IxH(X.A)=0 

X— ><*»  xeD  X— »oo  jfeD* 

and  from  (7.1  ), 

,  .  k 

/Q  V  XxjG(x,X)/  X  G(x.X)  »k-’  £  zf,  i  -  1....,n. 

'  '  xeO  xeD  0^1 

Q.  E.  [ 

From  Optimization  to  Simulated  Annealing 


Pincus,  understanding  that  his  1968  results  {1]  could  not  be  made  operational  without  a 
means  of  approximately  evaluating  integrals  over  irregular  sets  and  in  multi-dimensional  spaces, 
provided  in  1970  [2]  a  method  of  computation.  Since  F(x)  was  continuous  on  a  compact  set  S, 
the  Riemann  integrals  could  be  approximated  by  summing  the  values  of  the  integrands  at  a  finite 
grid  of  points  in  S  each  multiplied  by  the  volume  of  its  grid  cell.  His  key  was  the  World  War  II  work 
of  von  Neumann,  Ulam,  Everett  and  Metropolis  (3]  as  partly  summarized  but  not  referenced  in 


Chapter  9  of  the  excellent  1964  Methuen  monograph  of  Hammersley  and  Handscomb,  Monte 
Carlo  Methods  flO]. 


An  Incisively  clear  rendition  of  the  backgrounds  and  basic  idea  of  associating  an  ergodic 
irredudble  (finite  state)  Markov  chain  satisfying  the  strong  law  of  large  numbers  so  that  a  sample 
sequence  of  Markov  transitions  could  give  with  probabiRty  one  an  expected  value  which  would  be 
a  desired  integral  can  be  found  in  The  Monte  Carlo  Method”  by  N.  MetropoRs  and  N.  Uiam  in  the 
1949  Journal  of  the  American  Statistical  Assodation.  pp.  335-341 .  [13].  This  paper  which 

references  other  von  Neumann,  Everett,  Ulam  work  makes  clear  (page  338)  that  the  method  had 
been  and  was  being  appRed  to  many  problems  involving  (partially)  stochastic  fiows  governed  by 
integro-differential  equations,  not  just  to  Boitzmann  equiRbria  in  "statistical  mechanics.” 

Ignorance  or  unavailability  of  these  references  Rkely  led  to  the  current  misapprehension  of  history 
as  in  the  preface  to  the  1987  book  [1 1]  of  van  Laarhoven  and  Aaits  Simulated  Annealing:  Theory 
and  Applications.  D.  Reidel  Publishing  Co.,  Dordrecht,  Holland. 

Pincus'  work  dealt  with  optimization  problems  with  a  unique  optimum.  His  expressions  for 
this  optimum  as  a  iimit  of  expected  value  expressions  led  him  to  make  the  Monte  Carlo  Markov 
chain  connection.  Our  results  for  non-unique  global  optima,  the  usual  case,  provides  insight  into 
convergence  properties  of  associated  Markov  sample  mns  designated  "simulated  annealing,” 
free  of  metaphysical  non-mathematical  ideas  of  melting,  cooRng  and  freezing. 

Markov  Chain  Computation  and  Simulated  Annealing 

Pincus  in  [2],  borrowing  from  [3],  assodated  with  each  point  x''  of  S  in  the  approximating 
Riemann  integral  sums 

(10.1)  £  Xi'  exp  [-XF(x")]  and  X  exp 

k.l  k-l 

The  N-vector  of  coordinates  (for  aR  X  >  0) 


(10.2)  Ilj  -  exp[-A,F|]  /  I  exp  [-XFk]  .  j-1,. 


N 
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where  Fk  ■  F(x*^)  and  X  is  omitted  where  dear  from  context.  The  nj>  0  are  to  be  the  (unique) 
"invariant"  or  "stationary"  dstribution  of  an  (ergodic)  irredudbie,  aperiodic  Markov  process  with 
N  "states"  (or  "configurations"  in  simulated  annealing  "SA"  nomendature)  and  matrix  (Py) ,  i,  j  » 
1,...,  N.  Starting  with  an  arbitrary  symmetric,  irredudbie,  aperiodic  Markov  matrix  Pjj  ,  Pjj  is 


determined  by 


(11)  Pij 


j  ^  /^i  ^  1>  ®  ^  j 


,  if  Hj  /Hj  ^  1,  1 j 


Pi*i+I  i-j.  J'-{j:nj/ni  <lj 

I  j€  J- 


The  Pj*j  are  designated  Gy  ("generation  probabilities")  in  SA  literature  which  has 

Pj  j  -  Gj  jAy  where  Aj  j  is  an  "acceptance"  probabinty  matrix,  see  [11],  page  13.  The  SA  literature 
also  uses  X'1  -  c,  orT,  so  that  as  X  -»  oe  ,  c,orT  (terrrperature)  0. 

The  Markov  (sample)  chains  are  defined  as  follows; 

If  currently  in  "state"  i,  use  the  P}|  (Gjj)  probabilities  to  pick  state].  Calculate  /  Ilj  .  If 

/  Ilj  ^  1  ,  accept  1  as  the  new  state.  If  Ilj  /  llj  <  1,  then  with  probability  /  Ilj  accept  j  and 
with  1  -  11]  /  Ilj  go  back  to  the  original  state  i.  This  procedure  corresponds  to  sample  chains 
using  Pjj .  Operationally ,  Ilj  /  II|  2 1  means  Fj  s  F| ;  q  /  III  <  1  means  Fj  >  Fj .  Acceptance 
of  j  is  (Metropolis  criteria)  by  comparing  exp  [-  X  F  j|  with  a  random  number  R  drawn  from  the 
uniform  distribution  on  (0,1].  Accept  if  exp  [•  X  F  j  ]  >  R,  othenwise  stay  at  i. 


How  long  a  Markov  chain  to  use  for  each  X  value  is  not  well  specified  in  SA,  see  [1 1] 
pages  59-61 .  but  the  idea  is  (page  10)  "until  equiiibifum  is  approached  sufficiently  closely", 
whatever  "equilibrium",  undefined  mathematically,  may  mean.  Supposing  it  means  getting  close 
to  optimum,  this  is  usually  impossible  to  estimate  or  else  because  of  horrendous  to  do 
computation  time.  Pincus  in  [2]  gives  a  rough  Chebychev  inequaiity  bound  re  convegence  to  the 
expected  value  for  the  case  of  a  unique  global  ittinimum.  A  similar  bound  is  mentioned  [1 1] 
page  56  without  reference  to  Pincus  and  for,  presumably,  the  general  multiple  optima  situation 
which  Hajek  [5]  and  others  expiidtly  mention.  All  [1 1 . 1 2]  seem  to  have  the  idea  that  one  has 
"convergence"  to  a  single  state  i.e.  that  a  unique  minimum  state  for  each  X*^  ("temperature") 
employed  is  achieved  in  computation  which  is  stopped  either  when  the  mininrujm  state  repeats 
itself  often  or  by  choosing  the  state  reached  when  a  fixed  number  of  transitions  is  reached.  Proof 
that  such  repetition  can  "unreasonably"  often  ocorr  (even  for  simple  heads-tails  coin  tossing)  may 
be  found  in  Feller  [13]  Chapter  III,  especially  pages  77-86,  re  the  Arc  Sine  law  and  Probabilities  of 
Long  Leads. 

Example  Computations 

Nearly  all  SA  computation  (and  much  theory)  has  employed  Pj*j  matrices  with  equal  Pj*j 's 

in  a  band  around  the  main  diagonal  e.g.,  uniform  non-zero  probabilities  between  "neighboring” 
states,  zero  between  non-neighboring  states  but  with  positive  probability  of  reaching  any  one 
state  from  any  other  in  possibly  multiple  steps  (the  Irreducibility"  property). 

Abstracting  3  states,  1 , 2, 3,  corresponding  to  x  -  1 , 5, 9  from  example  A  with  Fi=  l, 

F2=  5,  .F3=  1,  using  the  uniform  ( Pj  j  )  matrix  of  elements  Pj*j  -  1/3,  choosing  X  >  0.5 

(temperature  -  2)  and  making  a  sample  run  of  96  transitions  starting  with  state  1 ,  we  reached  state 
one  47  times,  state  two  7  times,  state  three  42  times.  No  sojourn  in  a  state  lasted  more  than  4 
times;  every  state  was  a  global  minimum  starting  with  transition  66  whereupon  the  transitions  were 
flip-flops  between  the  global  minimum  states  1  and  3  rather  than  an  "equilibrium"  at  either.  The 


relative  frequencies  of  0.4948,  0.0722,  0.433  for  respectively  states  one,  two,  three  correspond 
to  the  theoretical  values  of  1/2,  0, 1/2  from  our  "discrete”  Pincus  Theorem. 

We  next  consider  a  class  of  travelling  salesmen  problems  with  2n  cities  having 
obvious  global  optimal  tours  of  minimum  dstance  traveled.  The  city  locations  are  at  points 

(1. 1) , ....  (1,  n),  (n,  0), ....  (1, 0)  in  the  plane  with  "rectangular  (e.g.  "Ii")  distance  between 

pairs  of  cities.  We  label  the  tours  (or  configurations)  as  cyclic  permutations  (1 . 2n)  where 

(1. 1) , . . .  (1,  n),  respectivety  correspond  to  cities  1 . n  and  (n,  0) . (1,0)  correspond  to 

n  -I- 1 . 2n.  Clearly  the  global  optimal  tours  are  those  starting  at  any  city  and  going  clockwise 

(or  counter  clockwise)  around  a  "rubberband”  encircling  the  cities.  The  transitions  are  an 
exchange  of  a  pair  of  cities  in  a  permutation.  Thus 

(a)  There  are  4n  globai  optimal  tours 

(b)  (2n)l  tours,  with  many  local  minima 

(c)  The  a  priori  probability  of  findng  a  global  optimum  is  quite  small,  0  (4n  /  (2n) !  ) 

(d)  The  average  tour  length  is  0(  n^ 

(e)  The  optimal  tour  length  is  2n  <  <  0(n^. 

Starting  with  2n  «  500,  the  Kirkpatrick  et  al  "Metropolis”  algorithm  [  4  ],  the  tour  (1 . n, 

2n,  2n-1 . n  -I- 1) ,  we  made  a  number  of  mns  on  a  SEQUENT  computer  (roughly  equivalent  to 

a  VAX  780).  We  present  results  for  some  "high"  temperature  runs  using  1000  transitions  and 
"low"  temperature  runs  using  3000  transitions  — 1000  transitions  took  45  CPU  minutes  on  the 
computer. 

Notice  that  at  high  temperatures  the  "equilibrium"  tends  toward  the  average  case;  at  low 
temperatures  it  gets  stuck  in  local  minima. 


Simulated  Annealing  Runs 
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(500  Cities,  1 ,000  Transitions  per  run,  starting  tour  cost  998,  optimal  cost  500.) 
Temperature  ( V)  Cost  Last  Tour  Cost  Best  Tour  Found 


10.0000 

4494 

998 

(Casel)  1.0000 

3284 

998 

0.1000 

3260 

998 

(3,000  transitions  per  mn,  starting  tour  cost  998) 

1.0000 

1146 

998 

0.1000 

1014 

998 

(Case  1)  0.5000 

1016 

998 

0.1000 

996 

998 

1.0000 

1146 

998 

(Case  2)  0.1000 

1014 

998 

0.0100 

1014 

998 

1.0000 

1150 

998 

(Cases)  0.1000 

1040 

998 

0.0100 

1040 

998 

1.0000 

1128 

998 

(Case  4)  0.5000 

1024 

998 

0.1000 

1010 

998 

0.0100 

1010 

998 

1.0000 

1206 

998 

(Case  5)  0.1000 

1026 

998 

0.0100 

1026 

998 
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Individual  Sample  Runs  Versus  Ensemble  Properties 

Simulated  annealing  analogies  to  physical  orthemnal  equilibrium  statistical  mechanics 
confuse  between  average  properties  of  ensembles,  properties  of  averaged  behavior  of  many 
individual  particles,  with  that  of  individual  particle  behavior.  The  vast  difference  mathematically 
between  ensemble  properties  and  behavior  in  individual  sample  mns  is  brought  out  in  Feller  [13] 
Chapter  III,  "Coin  Tossing  and  Random  Walks."  Without  such  behavior  there  would  likely  be  no 
gamblers  nor  attempts  at  gambling  systems. 

In  our  first  SA  example  we  chose  only  three  states  and  96  transitions  in  order  to  ensure 
that  the  ergodicity  property  of  the  sample  Markov  chain  would  take  over.  As  mentioned,  the  latter 
two-thirds  of  the  transitions  involved  flip  flops  between  the  two  global  minima  states  (with  j]q 
visitation  of  the  poorer  state).  Is  this  equilibrium,  ttie  state  at  which  to  terminate  the  sample  run,  as 
undefined  in  von  Laarhoven  and  Aarts  [1 1]  espea'atly  page  10? 

In  the  second  example,  starting  with  a  state  close  to  optimum  in  the  sense  that  only  2 
particular  interchanges  would  suffice  to  obtain  an  optimum,  not  only  did  we  fail  to  improve  from  this 
state  but  instead  got  substantially  worse  results  for  the  higher  (melting)  temperatures.  Of  course, 
we  only  (I)  did  1 ,000  and  3,000  transitions  for  a  problem  in  which  5001  transitions  are  a  priori 
possible.  But  45  minutes  (CPU  time)  per  1 ,000  transitions  were  required  on  a  mainframe 
computer.  And,  "equilibrium"  seemed  to  be  attained  on  these  mns. 

Many  combinatorial  optimization  problems  can  be  stated  in  continuum  form  and  solved,at 
worst  approximately,  by  continuum  methods.  For  example,  integer  programming  problems  with 
network  constraints  and  integer  data  can  be  solved  exactly  with  extreme  point  algorithms  which 
are  also  two  orders  of  magnitude  more  efficient  in  time  and  size  accomodated  than  general 
purpose  LP  algorithms.  Christophides  algorithm  for  the  traveling  salesman  problem  based  on 
network  stmcture  can  guarantee  no  worse  than  150%  over  optimal  cost.  The  simulated  annealing 
procedures  can  guarantee  nothing. 

Another  point  exhibited  by  our  Pincus  examples  is  that  the  convergence  of  the 
integral  formulations  may  be  to  a  different  convex  combination  of  the  global  minima  from  that 
of  the  approximating  Riemann  sums.  The  latter,  \ria  our  Discrete  Case  Theorem,  is  always  to 
the  simple  average  of  the  global  minima.  Thus  the  interchange  of  limit  operations,  X  oo 

and  sum  -» integral  can  make  large  differences  in  the  fluctuation  of  states  visited  (and  how  often) 


within  a  sample  run.  This  point  applies  whether  or  not  the  optimum  is  unique  since  approximate 
non-giobal  optima  may  have  equal  values.  Sample  runs  often  converge,  see  Hammersley  and 
Handscomb  [10],  to  local  non-global  optima  (with  flip  flops)  since  transitions  to  better  states  may 
be  possible  only  through  (very  unlikely)  transitions  to  states  poorer  than  the  local  optima. 


Thus  simulated  annealing  computations  cannot  be  trusted  to  deliver  global  optima. 
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